Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Traditional language models are unable to efficiently model entity names
observed in text. All but the most popular named entities appear infrequently
in text, providing insufficient context. Recent efforts have recognized that
context can be generalized between entity names that share the same type (e.g.,
\emph{person} or \emph{location}) and have equipped language models with access
to an external knowledge base (KB). Our Knowledge-Augmented Language Model
(KALM) continues this line of work by augmenting a traditional model with a KB.
Unlike previous methods, however, we train with an end-to-end predictive
objective optimizing the perplexity of text. We do not require any additional
information such as named entity tags. In addition to improving language
modeling performance, KALM learns to recognize named entities in an entirely
unsupervised way by using entity type information latent in the model. On a
Named Entity Recognition (NER) task, KALM achieves performance comparable with
state-of-the-art supervised models. Our work demonstrates that named entities
(and possibly other types of world knowledge) can be modeled successfully using
predictive learning and training on large corpora of text without any
additional information.
Comment: NAACL 2019; updated to cite Zhou et al. (2018) EMNLP as a piece of
related work.
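
To make the mixture-over-entity-types idea concrete, the sketch below shows one way a KB-augmented LM head could marginalize over a latent entity-type variable while exposing the type posterior as an unsupervised NER signal. This is an illustrative simplification, not the authors' code; the dimensions, number of types, and module names are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class KBMixtureLMHead(nn.Module):
    """Next-word distribution as a mixture over latent entity types
    (type 0 = general words, types 1..K-1 = KB entity types)."""
    def __init__(self, hidden_dim, vocab_size, num_types):
        super().__init__()
        self.type_scorer = nn.Linear(hidden_dim, num_types)
        self.word_scorers = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(num_types)]
        )

    def forward(self, h):
        # h: (batch, hidden_dim) context representation from any LM encoder.
        type_post = F.softmax(self.type_scorer(h), dim=-1)           # (batch, K)
        per_type = torch.stack(
            [F.softmax(w(h), dim=-1) for w in self.word_scorers], dim=1
        )                                                            # (batch, K, V)
        # Marginalize over the latent type: p(w | h) = sum_t p(t | h) p(w | t, h).
        p_word = (type_post.unsqueeze(-1) * per_type).sum(dim=1)     # (batch, V)
        return p_word, type_post

head = KBMixtureLMHead(hidden_dim=256, vocab_size=10000, num_types=4)
p_word, type_post = head(torch.randn(2, 256))
# Training maximizes log p_word of the observed next word (a perplexity objective);
# the per-position argmax of type_post serves as an unsupervised entity-type tag.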
Conundrums in noun phrase coreference resolution: making sense of the state-of-the-art
Journal Article
We aim to shed light on the state-of-the-art in NP coreference resolution by teasing apart the differences in the MUC and ACE task definitions, the assumptions made in evaluation methodologies, and inherent differences in text corpora. First, we examine three subproblems that play a role in coreference resolution: named entity recognition, anaphoricity determination, and coreference element detection. We measure the impact of each subproblem on coreference resolution and confirm that certain assumptions regarding these subproblems in the evaluation methodology can dramatically simplify the overall task. Second, we measure the performance of a state-of-the-art coreference resolver on several classes of anaphora and use these results to develop a quantitative measure for estimating coreference resolution performance on new data sets.
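
As a concrete illustration of how scoring assumptions shape reported results, below is a minimal sketch of link-based MUC scoring, one of the evaluation schemes the paper contrasts. The representation of chains as sets of mention ids and the example chains are assumptions of this illustration, not data from the paper.

def muc_recall(key_chains, response_chains):
    """Link-based MUC recall: sum_i (|K_i| - p(K_i)) / sum_i (|K_i| - 1), where
    p(K_i) is the number of pieces chain K_i is split into by the response."""
    num = den = 0
    for k in key_chains:
        pieces = [k & r for r in response_chains if k & r]
        covered = set().union(*pieces) if pieces else set()
        p = len(pieces) + len(k - covered)   # unmatched mentions count as singletons
        num += len(k) - p
        den += len(k) - 1
    return num / den if den else 0.0

def muc_f1(key_chains, response_chains):
    r = muc_recall(key_chains, response_chains)
    p = muc_recall(response_chains, key_chains)  # precision is the symmetric case
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Gold chains {1,2,3} and {4,5}; system chains {1,2} and {3,4,5}: MUC F1 = 0.667.
print(muc_f1([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}]))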
System for studying the parameters of gas solenoid valves
The aim of the present work is to construct a test stand for determining the characteristics of different fourth-generation gas injectors operating under conditions as close as possible to actual operating ones. For this purpose, a standard fourth-generation gas system with liquefied petroleum gas (LPG) as the working fluid was used for the stand. A system has been developed to maintain the gas leakage pressure equal to the pressure in the intake manifold of a Spark Ignition (SI) engine. The LPG used is compressed and liquefied for reuse, and safety measures have been put in place. The stand provides the right conditions for determining the influence of the nozzle diameter, the length of the connecting pipe between the injector and the intake manifold, the pressure differential across the injector, and other factors that affect these characteristics, which may differ when an LPG system is installed on an internal combustion engine.
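
For intuition about how the listed factors interact, here is a minimal sketch based on the simple orifice-flow relation m_dot = Cd * A * sqrt(2 * rho * dP). The discharge coefficient, gas density, and pressure values are assumed for illustration; compressibility and choked flow are ignored, and the paper's actual measurement procedure may differ.

import math

def injector_mass_flow(nozzle_diameter_m, upstream_pa, downstream_pa,
                       gas_density_kg_m3, discharge_coeff=0.8):
    """Approximate injector mass flow (kg/s) through a single nozzle orifice."""
    area = math.pi * (nozzle_diameter_m / 2) ** 2
    dp = max(upstream_pa - downstream_pa, 0.0)
    return discharge_coeff * area * math.sqrt(2 * gas_density_kg_m3 * dp)

# Example: 2 mm nozzle, 1.2 bar rail pressure against 0.5 bar manifold pressure,
# LPG vapour density ~2 kg/m^3 (assumed values).
flow = injector_mass_flow(2e-3, 1.2e5, 0.5e5, 2.0)
print(f"{flow * 1000:.3f} g/s")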
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE)
have proven effective in scaling up Transformer model size for
\textit{pretraining} large language models. By activating only part of the FFN
parameters conditioned on the input, S-FFN improves generalization performance
while keeping training and inference costs (in FLOPs) fixed. In this work, we
analyze two major design choices of S-FFN, the memory block (a.k.a. expert)
size and the memory block selection method, under a general conceptual framework
of sparse neural memory. Using this unified framework, we compare several S-FFN
architectures for language modeling and provide insights into their relative
efficacy and efficiency. We found that a simpler selection method,
\textbf{\texttt{Avg-K}}, which selects blocks through their mean aggregated
hidden states, achieves lower perplexity in language model pretraining than
existing MoE architectures, including the Switch Transformer (Fedus et al.,
2021) and HashLayer (Roller et al., 2021).
Comment: Accepted to EMNLP 2023